Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fud2] update path finding to support multi state input and output #2134

Merged
merged 38 commits into from
Jul 8, 2024

Conversation

jku20
Copy link
Collaborator

@jku20 jku20 commented Jun 11, 2024

This PR begins progress towards addressing #1958. It updates the algorithm used by fud2 for finding paths from input files to output files to take in operations which have many inputs and outputs.

The behavior concurs with the current fud2 in some but not all cases and does not maintain fud2's always finding the shortest path of operations.

Specifying operations themselves and emitting the Ninja file still assume single state inputs and outputs. This will be changed in further PRs.

@jku20 jku20 marked this pull request as draft June 12, 2024 00:31
@sampsyo
Copy link
Contributor

sampsyo commented Jun 12, 2024

Awesome! I think this is headed in the right direction as a first step. We obviously are already discussing more sophisticated approaches here in #2113, but I guess I am still optimistic that your approach here can work as a stepping stone?

The behavior concurs with the current fud2 in some but not all cases and does not maintain fud2's always finding the shortest path of operations.

Sorry to be somewhat slow on this, but I'm having trouble crystallizing the exact cases where this is true. Would it be possible to jot down a couple of examples (even if they are contrived)? That might help me understand and forget less quickly this time. 🤪

Specifying operations themselves and emitting the Ninja file still assume single state inputs and outputs. This will be changed in further PRs.

Great idea to defer this to a future change.

@jku20
Copy link
Collaborator Author

jku20 commented Jun 13, 2024

oops, probably should have given an example.

One example where behavior disagrees is on a state with a single op from itself to itself. The request asks to go from the state to itself through the given op.

The old algorithm would handle this perfectly fine. The new algorithm as implemented would say no path found because it cannot have two files of the same (initial) state. (Sidenote: the specific cased used, self loops with one input and output, can be hacked in as a special case to fix this, but I wouldn't call having a bunch of special cases like this flexible or "good")

Another example is one state with two paths to another state. One path is 2 ops long and another path is 1 op long. Assume a request from the first state to the latter and no --through argument.

The old algorithm would always pick the 1 op long path as it is shorter. The new algorithm however may pick the 2 op long path as it searches through ops in an arbitrary order.

@jku20 jku20 marked this pull request as ready for review June 19, 2024 04:28
@jku20 jku20 requested a review from sampsyo June 28, 2024 07:16
Copy link
Contributor

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks for your patience here—this took me a bit longer to review than I expected. But there is a lot of good stuff in here and I had a good time reading it over!

To summarize my view on this PR, I think it contains 3 main things:

  1. The general scaffolding to support multi-input/-output ops everywhere (i.e., the driver data structure is now a hypergraph).
  2. The new plan-search algorithm using exhaustive enumeration.
  3. A new approach to filename generation that is suited to our now more-complex plans.

Here is my current status on each element after spending some time with this PR:

  1. I understand the scaffolding stuff completely, and I left detailed comments that are mostly low-level code/docs stuff. Looking really good in general!!
  2. I do not really understand the new search algorithm yet, even though it's actually not very much code. The combination of recursion and mutation was too much for me, at least in this review session and with this level of documentation. This is some pretty knotty stuff that I may need your help to fully grok.
  3. I mostly understand the name-generation approach, but not 100%. The issue is that this seemingly-simple mechanism has suddenly gotten kinda fundamentally complicated in a way that may need further care.

I definitely don't want to hold up item 1, which is pretty close to be good to go, while I continue to wrestle with item 2 (and to some degree item 3). So I have an idea about how to proceed strategically:

  • First, what if we remove the new search algorithm (item 2) from this PR and return to the old DFS thing? We would just slip a couple of assert!s or whatever into the DFS-based code so we panic whenever we encounter any op with more than 1 input or output. But at least the functionality would remain identical and we could get the rest of this PR merged.
    • There is probably no need to similarly separate out the name-generation stuff into a different PR, but we could do it if we really want.
  • Second, we do what we've discussed a bit on Zulip/in person and separate out the search algorithm into a separate module with a minimal interface so it is easily replaceable.
  • Third, and finally, we reintroduce your enumerative search algorithm alongside the current DFS-based search. This will make it possible to do a thing you mentioned at one point: cross-validate the two implementations against each other, either based on a compile-time option or even a very funky CLI flag.

This strategy would not only let me more properly review the exhaustive search algorithm without holding up the "item 1" refactoring; it would also mean that we would just matter less in general if search algorithms are kinda hard to understand while they are works in progress. What do you think; do you think breaking things down in this way would be feasible?


Also, to call out one high-level suggestion from the comments: would it simplify things if we assumed there was only 1 stdin and 1 stdout file per Request? This seems certain to be true from a UI perspective, and it could alleviate some knottiness at various points.

/// Find a chain of Operations from the `start` state to the `end`, which may be a state or the
/// final operation in the chain.
fn find_path_segment(
/// Return parents ops of a given state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment (and possibly name) could be a little clearer: find the set of ops that have this state as an output.

fud2/fud-core/src/cli.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/cli.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/cli.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/cli.rs Show resolved Hide resolved
fud2/fud-core/src/exec/driver.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/exec/driver.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/exec/driver.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/exec/driver.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/exec/driver.rs Outdated Show resolved Hide resolved
@jku20
Copy link
Collaborator Author

jku20 commented Jul 1, 2024

Thanks for the thorough review!

The steps you describe make sense. Currently working on the "scaffolding" part of the PR.

On your suggestion: I don't think having only multiples files as input from stdin or multiple as output to stdout is a big source of difficulty or really much different from a single in and a single out. It's just a bit of a poor implementation right now which I need to change. At a very high level, the problem feels like goes into and comes out of gen_name is confused. I think rewriting/rethinking and documenting it and some of the surrounding code (e.g. the IO enum) will help a lot of this.

@jku20
Copy link
Collaborator Author

jku20 commented Jul 3, 2024

The new commits do the following:

  • refactor gen_name as that was particularly poorly implemented.
  • move find_path from driver into path
    • add flag to choose between new and old implementations
    • reify find_path as trait objects stored in Request and chose objects based on the cli (I think that's the right way to describe the implementation?)
  • improve documentation
  • undo bullet 2 by replacing the new find_path implementation with a todo!()
    • remove tests for the new find_path algorithm

Question: As mentioned in the final bullet, I've currently settled on replacing the find_path function in EnumeratePathFinder with a todo!() and copying the old code over in a new PR when that happens. This is to separate the new path finding algorithm into a new PR. Is there a better way to do this?

@jku20 jku20 requested a review from sampsyo July 3, 2024 06:07
@sampsyo
Copy link
Contributor

sampsyo commented Jul 4, 2024

Awesome; I will attempt to take a look as soon as I can!

Question: As mentioned in the final bullet, I've currently settled on replacing the find_path function in EnumeratePathFinder with a todo!() and copying the old code over in a new PR when that happens. This is to separate the new path finding algorithm into a new PR. Is there a better way to do this?

That seems like a great tactic to me!! Yay! I see no problems with leaving a todo!() in place, since it doesn't seem to break the existing codepath.

Copy link
Contributor

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wahoo! This looks fantastic! I only found a few super minor things to comment on. Feel free to hit the big green button whenever you want, after either incorporating or ignoring my comments. 😃

Next, I suppose we can do a separate PR with the enumerative search thingy?

fud2/fud-core/src/cli.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/exec/path.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/exec/request.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/exec/request.rs Outdated Show resolved Hide resolved
fud2/fud-core/src/run.rs Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this empty file should be deleted?

fud2/fud-core/src/exec/driver.rs Outdated Show resolved Hide resolved
@jku20 jku20 merged commit 259c2c8 into main Jul 8, 2024
18 checks passed
@jku20 jku20 deleted the fud2-multi-input-output-find-path branch July 8, 2024 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants